Introduction to Time Series
\(\hspace{0.3cm}\) More articles: \(\hspace{0.1cm}\) Estadistica4all
\(\hspace{0.3cm}\) Author: \(\hspace{0.1cm}\) Fabio Scielzo Ortiz
\(\hspace{0.3cm}\) If you use this article, please cite it:
\(\hspace{0.5cm}\) Scielzo Ortiz, F. (2023). Introduction to Time Series. http://estadistica4all.com/Articulos/Intervalos-de-confianza.html
It’s recommended to open the article on a computer or tablet.
1 Introduction to stochastic processes
1.1 Stochastic processes
Let \(\hspace{0.1cm}\mathcal{X}_t\hspace{0.1cm}\) be a random variable (r.v.), for each \(\hspace{0.1cm}t\in T\)
\(\hspace{0.25cm}\) A stochastic process is a set of random variables \(\hspace{0.1cm}\left\lbrace \hspace{0.1cm} \mathcal{X}_t \hspace{0.1cm}:\hspace{0.1cm} t \in T \hspace{0.1cm}\right\rbrace\hspace{0.1cm}\) such that \(\hspace{0.1cm}\mathcal{X}_t \in S \subset \mathbb{R}\)
\(\hspace{0.25cm}\) where:
\(T\hspace{0.1cm}\) is called the parameter space: the set of indices of the random variables that define the stochastic process. \(\\[0.35cm]\)
\(S\hspace{0.1cm}\) is called the state space: the set of values that the random variables of the process can take. \(\\[0.35cm]\)
We will say that \(\hspace{0.1cm}\lbrace \hspace{0.1cm} \mathcal{X}_t \in S \hspace{0.1cm} : \hspace{0.1cm} t \in T \hspace{0.1cm} \rbrace\hspace{0.15cm}\) is a stochastic process with parameter space \(\hspace{0.1cm}T\hspace{0.1cm}\) and state space \(\hspace{0.1cm}S\). \(\\[0.5cm]\)
Observation:
\(T\hspace{0.1cm}\) is generally interpreted as moments or periods of time, because one of the most important applications of stochastic processes is time series modeling.
Therefore:
\(\mathcal{X}_t\hspace{0.1cm}\) is a random variable usually used to model the state of a system at time moment \(\hspace{0.06cm}t\hspace{0.06cm}\), or to model a variable of interest at the moment or period \(\hspace{0.06cm}t\).
1.2 Discrete stochastic process
\(\hspace{0.25cm}\) \(\lbrace \hspace{0.1cm} \mathcal{X}_t \in S \hspace{0.1cm}:\hspace{0.1cm} t \in T \hspace{0.1cm} \rbrace\hspace{0.15cm}\) is a discrete stochastic process if \(\hspace{0.15cm}T\subset \lbrace 0,1,2,... \rbrace\)
1.3 Continuous stochastic process
\(\hspace{0.25cm}\) \(\lbrace \hspace{0.1cm} \mathcal{X}_t \in S \hspace{0.1cm}:\hspace{0.1cm} t \in T \hspace{0.1cm} \rbrace\hspace{0.15cm}\) is a continuous stochastic process if \(\hspace{0.15cm}T\subset [0, \infty)\)
1.4 Types of stochastic processes
1.4.1 Independent process
\(\hspace{0.25cm}\)\(\lbrace \hspace{0.1cm} \mathcal{X}_t \in S \hspace{0.1cm}:\hspace{0.1cm} t \in T \hspace{0.1cm} \rbrace\hspace{0.1cm}\) is an independent stochastic process if the random variables that define the process are independent.
1.4.2 Markov process
\(\hspace{0.25cm}\) A discrete stochastic process \(\hspace{0.1cm}\lbrace \hspace{0.1cm} \mathcal{X}_t \in S \hspace{0.1cm}:\hspace{0.1cm} t \in \lbrace 0,1,2,... \rbrace \hspace{0.1cm}\rbrace\hspace{0.2cm}\) is a Markov process if: \(\\[0.15cm]\)
\[P(\mathcal{X}_{n+1} = x_{n+1}\hspace{0.15cm} |\hspace{0.15cm} \mathcal{X}_0 = x_0 ,..., \mathcal{X}_n =x_n) \hspace{0.1cm}=\hspace{0.1cm} P(\mathcal{X}_{n+1} = x_{n+1}\hspace{0.15cm} |\hspace{0.15cm} \mathcal{X}_n = x_n)\]
\(\hspace{0.25cm}\) where: \(\hspace{0.2cm} x_{t} \in S \hspace{0.2cm},\hspace{0.2cm} \forall\hspace{0.1cm} t \in \lbrace 0,1,...,n+1\rbrace\) \(\\[0.35cm]\)
This property is known as the Markov or memoryless property, because it implies that the future state of the system, \(\hspace{0.05cm}\mathcal{X}_{n+1}\hspace{0.05cm}\), depends only on the present state \(x_n\) and not on the past states \(\hspace{0.05cm}x_0,...,x_{n- 1}\hspace{0.05cm}\).
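As an illustration (a minimal sketch, not part of the original text, with an arbitrary transition matrix), the following Python code simulates a two-state Markov chain; note that the next state is drawn using only the current state, which is precisely the memoryless property:

import numpy as np

rng = np.random.default_rng(0)
P = np.array([[0.9, 0.1],    # transition probabilities from state 0
              [0.4, 0.6]])   # transition probabilities from state 1
states = [0]
for _ in range(100):
    # the distribution of X_{n+1} depends only on the current state X_n
    states.append(rng.choice(2, p=P[states[-1]]))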
1.4.3 Process of independent increments
\(\hspace{0.25cm}\) A continuous stochastic process \(\hspace{0.1cm}\lbrace \hspace{0.1cm} \mathcal{X}_t \in S \hspace{0.1cm}:\hspace{0.1cm} t \geq 0 \rbrace\hspace{0.1cm}\) is an independent-increments process if:
\(\hspace{0.25cm}\) For any times \(\hspace{0.1cm}t_1,t_2,t_3\geq 0\hspace{0.13cm}\) such that \(\hspace{0.1cm}t_1 < t_2 < t_3\)
\(\hspace{0.25cm}\) \(\mathcal{X}_{t_2} - \mathcal{X}_{t_1} \hspace{0.1cm}\) and \(\hspace{0.1cm} \mathcal{X}_{t_3} - \mathcal{X}_{t_2}\hspace{0.1cm}\) are independent.
This means that the displacements of the process over the time intervals \(\hspace{0.1cm}[t_1 , t_2) , [t_2 , t_3)\hspace{0.1cm}\) are independent of each other, for all \(\hspace{0.1cm}0 \leq t_1 < t_2 < t_3\).
1.4.4 Strictly stationary process
\(\hspace{0.25cm}\) A continuous stochastic process \(\hspace{0.1cm}\lbrace \hspace{0.1cm} \mathcal{X}_t \in S \hspace{0.1cm}/\hspace{0.1cm} t \geq 0 \rbrace\hspace{0.2cm}\) is a strictly stationary process if:
\(\hspace{0.25cm}\) For all \(\hspace{0.05cm}t \geq 0\hspace{0.05cm}\) , the probability distribution of \(\hspace{0.05cm}\mathcal{X}_{t}\hspace{0.05cm}\) is the same as that of \(\hspace{0.05cm}\mathcal{X}_{t+h}\hspace{0.05cm}\) , for all \(\hspace{0.05cm}h>0\hspace{0.05cm}\).
Therefore, for any set of times \(\hspace{0.1cm}t_1 , t_2,...,t_n\) :
\(\hspace{0.2cm}(\mathcal{X}_{t_1}, \mathcal{X}_{t_2},\dots ,\mathcal{X}_{t_n} )\hspace{0.1cm}\) is identically distributed as \(\hspace{0.1cm}(\mathcal{X}_{t_1+h}, \mathcal{X}_{t_2+h},\dots ,\mathcal{X}_{t_n+h} )\)
1.4.5 Process with stationary increments
\(\hspace{0.25cm}\) A continuous stochastic process \(\hspace{0.1cm}\lbrace \hspace{0.1cm} \mathcal{X}_t \in S \hspace{0.1cm}/\hspace{0.1cm} t \geq 0 \rbrace\hspace{0.1cm}\) is a process with stationary increments if:
\(\hspace{0.25cm}\) For any pair of times \(\hspace{0.1cm}t_1,t_2 > 0\hspace{0.1cm}\) such that \(\hspace{0.1cm}t_1 < t_2\)
\(\hspace{0.25cm}\) \(\mathcal{X}_{t_2} - \mathcal{X}_{t_1}\hspace{0.1cm}\) and \(\hspace{0.1cm}\mathcal{X}_{t_2 + h} - \mathcal{X}_{t_1 + h}\hspace{0.1cm}\) are identically distributed, for any \(\hspace{0.1cm}h>0\)
1.4.6 Martingale process
\(\hspace{0.25cm}\) A discrete stochastic process \(\hspace{0.1cm}\lbrace \hspace{0.1cm} \mathcal{X}_t \in S \hspace{0.1cm}:\hspace{0.1cm} t \in \lbrace 0,1,2,... \rbrace \hspace{0.1cm}\rbrace\hspace{0.1cm}\) is a martingale if: \(\\[0.15cm]\)
\[E\left[\hspace{0.1cm}\mathcal{X}_{n+1} | X_0 = x_0 ,..., X_n = x_n\hspace{0.1cm} \right] \hspace{0.1cm} = \hspace{0.1cm} x_n\]
\(\hspace{0.25cm}\) where: \(\hspace{0.2cm} x_{t} \in S \hspace{0.2cm},\hspace{0.2cm} \forall\hspace{0.1cm} t \in \lbrace 0,1,...,n+1\rbrace\) \(\\[0.35cm]\)
This property is known as the martingale property, and it implies that the expected value of the system at the future time \(\hspace{0.1cm}n+1\hspace{0.1cm}\) is the value of the system at the present, \(\hspace{0.1cm}x_n\). In mean, the system does not move from the state observed at the last moment.
1.4.7 Lévy process
\(\hspace{0.25cm}\) A continuous stochastic process \(\hspace{0.1cm}\lbrace \hspace{0.1cm} \mathcal{X}_t \in S \hspace{0.1cm}:\hspace{0.1cm} t \geq 0 \rbrace\hspace{0.15cm}\) is a Lévy process if it has independent and stationary increments.
The Poisson process and Brownian motion are examples of Lévy processes.
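As a minimal sketch (an assumed example, with the rate and time step chosen arbitrarily), both processes can be simulated on a discrete time grid by accumulating independent increments whose distribution depends only on the interval length dt:

import numpy as np

rng = np.random.default_rng(1)
n, dt = 1000, 0.01
# Brownian motion: independent N(0, dt) increments
brownian = np.cumsum(rng.normal(0.0, np.sqrt(dt), n))
# Poisson process with rate lam: independent Poisson(lam * dt) increments
lam = 3.0
poisson = np.cumsum(rng.poisson(lam * dt, n))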
1.4.8 Gaussian Process
\(\hspace{0.25cm}\) A continuous stochastic process \(\hspace{0.1cm}\lbrace \hspace{0.1cm} \mathcal{X}_t \in S \hspace{0.1cm}/\hspace{0.1cm} t \geq 0 \rbrace\hspace{0.15cm}\) is a Gaussian process if:
\(\hspace{0.25cm}\) For any set of times \(\hspace{0.1cm}t_1,...,t_n \geq 0\) : \(\\[0.12cm]\)
\[(\mathcal{X}_{t_1}, \mathcal{X}_{t_2},...,\mathcal{X}_{t_n}) \sim NM(\mu , \Sigma)\]
\(\hspace{0.25cm}\) where:
\(\hspace{0.3cm}\) \(NM(\mu , \Sigma)\hspace{0.1cm}\) denotes the multivariate normal probability distribution with mean vector \(\hspace{0.1cm}\mu\hspace{0.1cm}\) and covariance matrix \(\hspace{0.1cm}\Sigma\) . \(\\[0.2cm]\)
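For example (a sketch with an assumed squared-exponential covariance, not from the original), a trajectory of a Gaussian process can be drawn by sampling the multivariate normal distribution \(NM(\mu, \Sigma)\) at a finite set of times:

import numpy as np

rng = np.random.default_rng(2)
t = np.linspace(0, 1, 50)                            # times t_1, ..., t_n
mu = np.zeros(len(t))                                # mean vector
Sigma = np.exp(-(t[:, None] - t[None, :])**2 / 0.1)  # covariance matrix
trajectory = rng.multivariate_normal(mu, Sigma)      # one draw of (X_{t_1},...,X_{t_n})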
The dynamic phenomena that we observe in a time series can be grouped into two classes:
- The first are those that take stable values in time around a constant level, without showing a long term increasing or decreasing trend. These processes are called stationary.
Examples of those are the average yearly temperatures in a region or the proportion of births corresponding to males.
- A second class of processes are the non-stationary processes, which are those that can show trend, seasonality and other evolutionary effects over time.
Examples of those are the yearly income of a country, company sales or energy demand. These are series that evolve over time with more or less stable trends.
In practice, the classification of a series as stationary or not depends on the period of observation, since the series can be stable in a short period and non-stationary in a longer one.
2 Time series
\(\hspace{0.2cm}\) Given a stochastic process \(\hspace{0.15cm} \mathcal{Y} \hspace{0.1cm}=\hspace{0.1cm} \Bigl( \hspace{0.06cm} \mathcal{Y}_t \hspace{0.12cm}: \hspace{0.12cm} t \in T=\lbrace 1,2,...,n \rbrace \hspace{0.06cm}\Bigl) \hspace{0.1cm} = \hspace{0.1cm}\Bigl( \hspace{0.06cm} \mathcal{Y}_1 , \mathcal{Y}_2 ,..., \mathcal{Y}_n \hspace{0.06cm}\Bigl) \hspace{0.05cm}\) .
\(\hspace{0.25cm}\) Given a sample of one observation \(\hspace{0.08cm}y_t\hspace{0.08cm}\) of each random variable \(\hspace{0.08cm}\mathcal{Y}_t\hspace{0.08cm}\) of the process, for \(\hspace{0.06cm}t \in T=\lbrace 1,2,...,n \rbrace\hspace{0.06cm}\). \(\\[0.3cm]\)
- \(\hspace{0.15cm} Y_t = \left( y_1, y_2, ...,y_n \right)^t \hspace{0.15cm}\) is a time series associated to the stochastic process \(\hspace{0.09cm}\mathcal{Y}\).
\(\hspace{0.25cm}\) where:
\(\hspace{0.35cm}\) \(y_t\hspace{0.06cm}\) is frequently interpreted as the observed value of the variable \(\hspace{0.06cm}\mathcal{Y}\hspace{0.06cm}\) at the time or period \(\hspace{0.06cm}t\). Hence the name time series. \(\\[0.15cm]\)
Observations:
\(y_t \in \mathbb{R}\hspace{0.08cm}\) is a realization of the random variable \(\hspace{0.08cm}\mathcal{Y}_t\) \(\\[0.35cm]\)
A time series is a realization of a stochastic process. The time series is considered a result or trajectory of the stochastic process. \(\\[0.35cm]\)
A time series can be defined as a vector of data points ordered in time, where the data are equally spaced in time: between consecutive data points there is the same time interval, such as a week, a month, a quarter, a year…
The process is characterized by the joint probability distribution of the random variables \(\hspace{0.1cm} \mathcal{Y}_1 , \mathcal{Y}_2 ,..., \mathcal{Y}_n \hspace{0.1cm}\), namely, it is characterized by the joint density or probability function \(\hspace{0.08cm}f_{\hspace{0.08cm}\mathcal{Y}_1 , \mathcal{Y}_2 ,..., \mathcal{Y}_n}\)
This distribution is called the finite-dimensional distribution of the process. We say that we know the probabilistic structure of the stochastic process when we know that joint distribution, which determines the distribution of any subset of the variables and, in particular, the marginal distribution of each variable.
3 Mean function
\(\hspace{0.2cm}\) Given a stochastic process \(\hspace{0.15cm} \mathcal{Y} \hspace{0.1cm}=\hspace{0.1cm} \Bigl( \hspace{0.06cm} \mathcal{Y}_t \hspace{0.12cm}: \hspace{0.12cm} t \in T=\lbrace 1,2,...,n \rbrace \hspace{0.06cm}\Bigl) \hspace{0.1cm} = \hspace{0.1cm}\Bigl( \hspace{0.06cm} \mathcal{Y}_1 , \mathcal{Y}_2 ,..., \mathcal{Y}_n \hspace{0.06cm}\Bigl) \hspace{0.05cm}\) .
\(\hspace{0.2cm}\) The mean function \(\hspace{0.1cm} \mu_{\hspace{0.03cm}t} \hspace{0.1cm}\) of the process is defined as: \(\\[0.15cm]\)
\[\mu_{\hspace{0.03cm}t} = E\Bigl[\hspace{0.08cm}\mathcal{Y}_t\hspace{0.08cm}\Bigr]\]
\(\hspace{0.2cm}\) for \(\hspace{0.1cm} t \in \lbrace 1,2,...,n \rbrace .\\\)
Observations:
An important particular case, due to its simplicity, arises when all the variables have the same mean and thus the mean function is a constant. The realizations of the process show no trend and we say that the process is stable in the mean. \(\\[0.35cm]\)
If, on the contrary, the means change over time, the observations at different moments will reveal that change. \(\\[0.35cm]\)
On many occasions we only have one realization of the stochastic process and we have to deduce from that whether the mean function of the process is, or is not, constant over time.
4 Variance function
\(\hspace{0.2cm}\) Given a stochastic process \(\hspace{0.15cm} \mathcal{Y} \hspace{0.1cm}=\hspace{0.1cm} \Bigl( \hspace{0.06cm} \mathcal{Y}_t \hspace{0.12cm}: \hspace{0.12cm} t \in T=\lbrace 1,2,...,n \rbrace \hspace{0.06cm}\Bigl) \hspace{0.1cm} = \hspace{0.1cm}\Bigl( \hspace{0.06cm} \mathcal{Y}_1 , \mathcal{Y}_2 ,..., \mathcal{Y}_n \hspace{0.06cm}\Bigl) \hspace{0.05cm}\) .
\(\hspace{0.2cm}\) The variance function \(\hspace{0.1cm} \sigma^2_{\hspace{0.03cm}t} \hspace{0.1cm}\) of the process is defined as: \(\\[0.15cm]\)
\[\sigma^2_t = Var\Bigl[\hspace{0.08cm} \mathcal{Y}_t \hspace{0.08cm}\Bigr]\]
\(\hspace{0.2cm}\) for \(t \in \lbrace 1,2,...,n \rbrace \\\)
Observations:
We say that the process is stable in the variance if the variability is constant over time. \(\\[0.3cm]\)
A process can be stable in the mean but not in the variance and vice versa.
5 Autocovariance function
The structure of linear dependence between random variables is represented by the covariance and correlation functions.
\(\hspace{0.2cm}\) Given a stochastic process \(\hspace{0.15cm} \mathcal{Y} \hspace{0.1cm}=\hspace{0.1cm} \Bigl( \hspace{0.06cm} \mathcal{Y}_t \hspace{0.12cm}: \hspace{0.12cm} t \in T=\lbrace 1,2,...,n \rbrace \hspace{0.06cm}\Bigl) \hspace{0.1cm} = \hspace{0.1cm}\Bigl( \hspace{0.06cm} \mathcal{Y}_1 , \mathcal{Y}_2 ,..., \mathcal{Y}_n \hspace{0.06cm}\Bigl) \hspace{0.05cm}\) .
\(\hspace{0.2cm}\) The autocovariance function \(\hspace{0.08cm}\gamma_{\hspace{0.03cm}t \hspace{0.03cm},\hspace{0.03cm} t+h}\hspace{0.08cm}\) of the process is defined as: \(\\[0.25cm]\)
\[\gamma_{\hspace{0.03cm}t\hspace{0.03cm} ,\hspace{0.03cm} t+h} \hspace{0.08cm}=\hspace{0.08cm} Cov\left(\hspace{0.08cm}\mathcal{Y}_t \hspace{0.08cm} ,\hspace{0.08cm} \mathcal{Y}_{t+h}\hspace{0.08cm} \right) \hspace{0.08cm}=\hspace{0.08cm} E \hspace{0.08cm}\Bigl[\hspace{0.08cm} (\mathcal{Y}_t - \mu_{\hspace{0.03cm}t})\cdot (\mathcal{Y}_{t+h} - \mu_{\hspace{0.03cm}t+h}) \hspace{0.08cm} \Bigr] \\\]
\(\hspace{0.2cm}\) for \(\hspace{0.1cm}t \in \lbrace 1,2,...,n \rbrace\hspace{0.12cm}\) and \(\hspace{0.1cm} h\in \lbrace \pm 1, \pm 2,... \rbrace . \\\)
\(\hspace{0.2cm}\) In particular, we have :
\[\gamma_{\hspace{0.03cm}t\hspace{0.03cm} , \hspace{0.03cm}t} \hspace{0.05cm}=\hspace{0.05cm} \sigma_t^2 \\\]
The autocovariances have units (the square of the units of the series), so it is not advisable to use them to compare series measured in different units.
6 Autocorrelation function
\(\hspace{0.2cm}\) Given a stochastic process \(\hspace{0.15cm} \mathcal{Y} \hspace{0.1cm}=\hspace{0.1cm} \Bigl( \hspace{0.06cm} \mathcal{Y}_t \hspace{0.12cm}: \hspace{0.12cm} t \in T=\lbrace 1,2,...,n \rbrace \hspace{0.06cm}\Bigl) \hspace{0.1cm} = \hspace{0.1cm}\Bigl( \hspace{0.06cm} \mathcal{Y}_1 , \mathcal{Y}_2 ,..., \mathcal{Y}_n \hspace{0.06cm}\Bigl) \hspace{0.05cm}\) .
\(\hspace{0.2cm}\) The autocorrelation function \(\hspace{0.08cm}\rho_{\hspace{0.05cm}t \hspace{0.05cm},\hspace{0.05cm} t+h}\hspace{0.08cm}\) of the process is defined as: \(\\[0.25cm]\)
\[\rho_{\hspace{0.05cm}t \hspace{0.05cm},\hspace{0.05cm} t+h\hspace{0.05cm}} =\hspace{0.05cm} \dfrac{\gamma_{\hspace{0.05cm}t\hspace{0.05cm} , \hspace{0.05cm}t+h\hspace{0.05cm}}}{\sqrt{\sigma_{\hspace{0.05cm}t}^2 \cdot \sigma_{\hspace{0.05cm}t+h}^2\hspace{0.08cm}}\hspace{0.08cm}} \\\]
\(\hspace{0.2cm}\) for \(\hspace{0.1cm}t \in \lbrace 1,2,...,n \rbrace\hspace{0.1cm}\) and \(\hspace{0.1cm} h\in \lbrace \pm 1, \pm 2,... \rbrace \\\)
\(\hspace{0.2cm}\) In particular, we have :
\[\rho_{\hspace{0.05cm}t\hspace{0.05cm} ,\hspace{0.05cm} t\hspace{0.05cm}} =\hspace{0.05cm} 1 \\\]
It is interesting to notice the differences between conditional distributions and the marginal distributions.
The marginal distribution of \(\hspace{0.08cm}\mathcal{Y}_t\hspace{0.08cm}\) represents what we know about a variable, without knowing anything about its trajectory until time \(\hspace{0.05cm}t\hspace{0.05cm}\).
The conditional distribution of \(\hspace{0.08cm}\mathcal{Y}_t\hspace{0.08cm}\) given \(\hspace{0.08cm}\mathcal{Y}_{t-1}\hspace{0.05cm}\),…,\(\hspace{0.05cm}\mathcal{Y}_{t-r}\hspace{0.08cm}\) represents what we know about a variable when we know the \(r\) previous values of the process.
In time series conditional distributions are of greater interest than marginal ones because they define the predictions that we can make about the future knowing the past.
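As a brief illustration of the definitions above (a sketch, not in the original), the sample counterparts of the autocovariance and autocorrelation functions can be computed for an observed series; here stationarity is assumed, so that \(\gamma\) and \(\rho\) depend only on the lag \(h\):

import numpy as np

def sample_autocorr(y, h):
    # sample autocovariance at lag h divided by the sample variance (lag 0)
    y = np.asarray(y, dtype=float)
    ybar = y.mean()
    gamma_h = np.mean((y[: len(y) - h] - ybar) * (y[h:] - ybar))
    gamma_0 = np.mean((y - ybar) ** 2)
    return gamma_h / gamma_0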
7 Visualization of time series in Python
Throughout this article we will use a time series on sales of a company.
First of all, we load some of the libraries that we are going to use:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_theme()
sns.set(rc={'figure.figsize':(20,9)})
We load the dataset:
Time_Series_1 = pd.read_csv('Time_Series_1.csv')
Time_Series_1

| | CODARTIC | CODIREGG | IMPLINEA | IMPVENTA | CODTAVEN | QCANTDEC | NUMTARJE |
|---|---|---|---|---|---|---|---|
| 0 | 10254201003449 | 1 | 59.99 | 59.99 | 2022-06-21001069602336387 | 1.0 | 6.008330e+18 |
| 1 | 10234141001560 | 1 | 12.72 | 12.72 | 2021-08-03001002100821700 | 1.0 | NaN |
| 2 | 10865290000019 | 4 | 11.20 | 11.20 | 2022-08-21001009006714129 | 1.0 | 6.008330e+18 |
| 3 | 10004136025419 | 1 | 48.97 | 48.97 | 2022-07-09001003602372694 | 1.0 | NaN |
| 4 | 10073131011804 | 1 | 22.95 | 22.95 | 2022-05-14001091203649080 | 1.0 | 6.008330e+18 |
| … | … | … | … | … | … | … | … |
| 704350 | 10271412016641 | 1 | 18.90 | 18.90 | 2022-02-10001003601776985 | 1.0 | NaN |
| 704351 | 10411524000184 | 2 | -29.99 | -29.99 | 2022-06-02001009803937030 | -1.0 | NaN |
| 704352 | 10092532091505 | 1 | 75.00 | 75.00 | 2022-05-19001003602477163 | 1.0 | 6.008330e+18 |
| 704353 | 10805731000015 | 2 | -12.59 | -10.70 | 2022-02-17001009008963198 | -1.0 | 6.008330e+18 |
| 704354 | 10084472093096 | 1 | 6.95 | 6.95 | 2021-06-06001048902492799 | 1.0 | NaN |
704355 rows × 7 columns
The default periodicity of this time series is daily.
We can group a time series by different periods, such as hours, days, weeks, months, quarters, years…
Concretely, we will group this time series by day, week, month and quarter.
We can extract the date column as follows:
Time_Series_1['Fecha'] = Time_Series_1['CODTAVEN'].str[0:10]
We have to convert the date column to datetime format:
Time_Series_1['Fecha'] = pd.to_datetime(Time_Series_1['Fecha'])
We can create the columns Day, Week, Month, Quarter and Year as follows:
Time_Series_1['Dia'] = Time_Series_1['Fecha'].dt.day
Time_Series_1['Semana'] = Time_Series_1['Fecha'].dt.isocalendar().week  # .dt.week was removed in recent pandas versions
Time_Series_1['Mes'] = Time_Series_1['Fecha'].dt.month
Time_Series_1['Trimestre'] = Time_Series_1['Fecha'].dt.quarter
Time_Series_1['Año'] = Time_Series_1['Fecha'].dt.year
We select the columns with which we are going to work.
IMPVENTA will be the response variable, namely, the variable we want to predict.
Time_Series_1 = Time_Series_1.loc[: , ['Fecha', 'Dia', 'Semana', 'Mes', 'Trimestre', 'Año', 'IMPVENTA']]
Time_Series_1

| | Fecha | Dia | Semana | Mes | Trimestre | Año | IMPVENTA |
|---|---|---|---|---|---|---|---|
| 0 | 2022-06-21 | 21 | 25 | 6 | 2 | 2022 | 59.99 |
| 1 | 2021-08-03 | 3 | 31 | 8 | 3 | 2021 | 12.72 |
| 2 | 2022-08-21 | 21 | 33 | 8 | 3 | 2022 | 11.20 |
| 3 | 2022-07-09 | 9 | 27 | 7 | 3 | 2022 | 48.97 |
| 4 | 2022-05-14 | 14 | 19 | 5 | 2 | 2022 | 22.95 |
| … | … | … | … | … | … | … | … |
| 704350 | 2022-02-10 | 10 | 6 | 2 | 1 | 2022 | 18.90 |
| 704351 | 2022-06-02 | 2 | 22 | 6 | 2 | 2022 | -29.99 |
| 704352 | 2022-05-19 | 19 | 20 | 5 | 2 | 2022 | 75.00 |
| 704353 | 2022-02-17 | 17 | 7 | 2 | 1 | 2022 | -10.70 |
| 704354 | 2021-06-06 | 6 | 22 | 6 | 2 | 2021 | 6.95 |
704355 rows × 7 columns
7.1 Visualization of Monthly Time Series
First of all, we group the response variable IMPVENTA by year and month:
Monthly_Time_Series_1 = Time_Series_1.groupby(['Año', 'Mes'])['IMPVENTA'].sum().reset_index(drop=False)
Monthly_Time_Series_1

| | Año | Mes | IMPVENTA |
|---|---|---|---|
| 0 | 2021 | 6 | 992891.50 |
| 1 | 2021 | 7 | 982142.75 |
| 2 | 2021 | 8 | 885289.16 |
| 3 | 2021 | 9 | 878563.14 |
| 4 | 2021 | 10 | 923552.45 |
| 5 | 2021 | 11 | 1527486.61 |
| 6 | 2021 | 12 | 1438722.72 |
| 7 | 2022 | 1 | 1910816.46 |
| 8 | 2022 | 2 | 1317501.34 |
| 9 | 2022 | 3 | 1524652.47 |
| 10 | 2022 | 4 | 2060509.71 |
| 11 | 2022 | 5 | 2316733.47 |
| 12 | 2022 | 6 | 2872219.23 |
| 13 | 2022 | 7 | 2731251.02 |
| 14 | 2022 | 8 | 1844222.02 |
| 15 | 2022 | 9 | 1068975.95 |
| 16 | 2022 | 10 | 895735.29 |
| 17 | 2022 | 11 | 1544511.61 |
| 18 | 2022 | 12 | 1272814.69 |
This step is necessary to create the Mes-Año (Month-Year) column, which will be used to carry out the plot.
Monthly_Time_Series_1['Año'] = Monthly_Time_Series_1['Año'].astype('string')
Monthly_Time_Series_1['Mes'] = Monthly_Time_Series_1['Mes'].astype('string')
Monthly_Time_Series_1['Mes-Año'] = Monthly_Time_Series_1[['Mes', 'Año']].agg('-'.join, axis=1)
Monthly_Time_Series_1

| | Año | Mes | IMPVENTA | Mes-Año |
|---|---|---|---|---|
| 0 | 2021 | 6 | 992891.50 | 6-2021 |
| 1 | 2021 | 7 | 982142.75 | 7-2021 |
| 2 | 2021 | 8 | 885289.16 | 8-2021 |
| 3 | 2021 | 9 | 878563.14 | 9-2021 |
| 4 | 2021 | 10 | 923552.45 | 10-2021 |
| 5 | 2021 | 11 | 1527486.61 | 11-2021 |
| 6 | 2021 | 12 | 1438722.72 | 12-2021 |
| 7 | 2022 | 1 | 1910816.46 | 1-2022 |
| 8 | 2022 | 2 | 1317501.34 | 2-2022 |
| 9 | 2022 | 3 | 1524652.47 | 3-2022 |
| 10 | 2022 | 4 | 2060509.71 | 4-2022 |
| 11 | 2022 | 5 | 2316733.47 | 5-2022 |
| 12 | 2022 | 6 | 2872219.23 | 6-2022 |
| 13 | 2022 | 7 | 2731251.02 | 7-2022 |
| 14 | 2022 | 8 | 1844222.02 | 8-2022 |
| 15 | 2022 | 9 | 1068975.95 | 9-2022 |
| 16 | 2022 | 10 | 895735.29 | 10-2022 |
| 17 | 2022 | 11 | 1544511.61 | 11-2022 |
| 18 | 2022 | 12 | 1272814.69 | 12-2022 |
We create the plot:
fig, ax = plt.subplots()
p=sns.lineplot(x="Mes-Año", y="IMPVENTA", data=Monthly_Time_Series_1 , color='red')
plt.setp(p.get_xticklabels(), rotation=90)
plt.title("Monthly Time Series", fontsize = 17)
fig.savefig('p1.jpg', format='jpg', dpi=1200)
plt.show()

7.2 Visualization of Daily Time Series
First of all, we group the response variable IMPVENTA by year, month and day:
Daily_Time_Series_1 = Time_Series_1.groupby(['Año', 'Mes','Dia'])['IMPVENTA'].sum().reset_index(drop=False)
Daily_Time_Series_1

| | Año | Mes | Dia | IMPVENTA |
|---|---|---|---|---|
| 0 | 2021 | 6 | 1 | 26423.78 |
| 1 | 2021 | 6 | 2 | 18752.01 |
| 2 | 2021 | 6 | 3 | 22812.84 |
| 3 | 2021 | 6 | 4 | 107889.11 |
| 4 | 2021 | 6 | 5 | 136714.44 |
| … | … | … | … | … |
| 574 | 2022 | 12 | 27 | 64542.49 |
| 575 | 2022 | 12 | 28 | 59913.84 |
| 576 | 2022 | 12 | 29 | 53815.43 |
| 577 | 2022 | 12 | 30 | 52695.32 |
| 578 | 2022 | 12 | 31 | 39739.67 |
579 rows × 4 columns
This step is necessary to create the Dia-Mes-Año (Day-Month-Year) column, which will be used to carry out the plot.
Daily_Time_Series_1['Año'] = Daily_Time_Series_1['Año'].astype('string')
Daily_Time_Series_1['Mes'] = Daily_Time_Series_1['Mes'].astype('string')
Daily_Time_Series_1['Dia'] = Daily_Time_Series_1['Dia'].astype('string')
Daily_Time_Series_1['Dia-Mes-Año'] = Daily_Time_Series_1[['Dia', 'Mes', 'Año']].agg('-'.join, axis=1)
Daily_Time_Series_1

| | Año | Mes | Dia | IMPVENTA | Dia-Mes-Año |
|---|---|---|---|---|---|
| 0 | 2021 | 6 | 1 | 26423.78 | 1-6-2021 |
| 1 | 2021 | 6 | 2 | 18752.01 | 2-6-2021 |
| 2 | 2021 | 6 | 3 | 22812.84 | 3-6-2021 |
| 3 | 2021 | 6 | 4 | 107889.11 | 4-6-2021 |
| 4 | 2021 | 6 | 5 | 136714.44 | 5-6-2021 |
| … | … | … | … | … | … |
| 574 | 2022 | 12 | 27 | 64542.49 | 27-12-2022 |
| 575 | 2022 | 12 | 28 | 59913.84 | 28-12-2022 |
| 576 | 2022 | 12 | 29 | 53815.43 | 29-12-2022 |
| 577 | 2022 | 12 | 30 | 52695.32 | 30-12-2022 |
| 578 | 2022 | 12 | 31 | 39739.67 | 31-12-2022 |
579 rows × 5 columns
Now, we make the plot:
fig, ax = plt.subplots()
p=sns.lineplot(x="Dia-Mes-Año", y="IMPVENTA", data=Daily_Time_Series_1 , color='red')
p.set_xticks(np.arange(0 , len(Daily_Time_Series_1) , 40))
plt.setp(p.get_xticklabels(), rotation=90)
plt.title("Daily Time Series", fontsize = 20)
fig.savefig('p2.jpg', format='jpg', dpi=1200)
plt.show()

7.3 Weekly Time Series
First of all, we group the response variable IMPVENTA by year, month and week:
Weekly_Time_Series_1 = Time_Series_1.groupby(['Año', 'Mes','Semana'])['IMPVENTA'].sum().reset_index(drop=False)
Weekly_Time_Series_1

| | Año | Mes | Semana | IMPVENTA |
|---|---|---|---|---|
| 0 | 2021 | 6 | 22 | 329029.82 |
| 1 | 2021 | 6 | 23 | 158833.59 |
| 2 | 2021 | 6 | 24 | 201568.21 |
| 3 | 2021 | 6 | 25 | 196610.74 |
| 4 | 2021 | 6 | 26 | 106849.14 |
| … | … | … | … | … |
| 94 | 2022 | 12 | 48 | 89144.45 |
| 95 | 2022 | 12 | 49 | 301484.19 |
| 96 | 2022 | 12 | 50 | 308829.59 |
| 97 | 2022 | 12 | 51 | 262326.49 |
| 98 | 2022 | 12 | 52 | 311029.97 |
99 rows × 4 columns
This step is necessary to create the Semana-Mes-Año (Week-Month-Year) column, which will be used to carry out the plot.
Weekly_Time_Series_1['Año'] = Weekly_Time_Series_1['Año'].astype('string')
Weekly_Time_Series_1['Mes'] = Weekly_Time_Series_1['Mes'].astype('string')
Weekly_Time_Series_1['Semana'] = Weekly_Time_Series_1['Semana'].astype('string')
Weekly_Time_Series_1['Semana-Mes-Año'] = Weekly_Time_Series_1[['Semana', 'Mes', 'Año']].agg('-'.join, axis=1)
We make the plot:
fig, ax = plt.subplots()
p=sns.lineplot(x="Semana-Mes-Año", y="IMPVENTA", data=Weekly_Time_Series_1 , color='red')
p.set_xticks(np.arange(0 , len(Weekly_Time_Series_1) , 5))
plt.setp(p.get_xticklabels(), rotation=90)
plt.title("Weekly Time Series", fontsize = 17)
fig.savefig('p3.jpg', format='jpg', dpi=1200)
plt.show()

7.4 Quarterly Time Series
First of all, we group the response variable IMPVENTA by year and quarter:
Quarter_Time_Series_1 = Time_Series_1.groupby(['Año', 'Trimestre'])['IMPVENTA'].sum().reset_index(drop=False)
Quarter_Time_Series_1

| | Año | Trimestre | IMPVENTA |
|---|---|---|---|
| 0 | 2021 | 2 | 992891.50 |
| 1 | 2021 | 3 | 2745995.05 |
| 2 | 2021 | 4 | 3889761.78 |
| 3 | 2022 | 1 | 4752970.27 |
| 4 | 2022 | 2 | 7249462.41 |
| 5 | 2022 | 3 | 5644448.99 |
| 6 | 2022 | 4 | 3713061.59 |
Quarter_Time_Series_1['Año'] = Quarter_Time_Series_1['Año'].astype('string')
Quarter_Time_Series_1['Trimestre'] = Quarter_Time_Series_1['Trimestre'].astype('string')
Quarter_Time_Series_1['Trimestre-Año'] = Quarter_Time_Series_1[['Trimestre', 'Año']].agg('-'.join, axis=1)
Quarter_Time_Series_1

| | Año | Trimestre | IMPVENTA | Trimestre-Año |
|---|---|---|---|---|
| 0 | 2021 | 2 | 992891.50 | 2-2021 |
| 1 | 2021 | 3 | 2745995.05 | 3-2021 |
| 2 | 2021 | 4 | 3889761.78 | 4-2021 |
| 3 | 2022 | 1 | 4752970.27 | 1-2022 |
| 4 | 2022 | 2 | 7249462.41 | 2-2022 |
| 5 | 2022 | 3 | 5644448.99 | 3-2022 |
| 6 | 2022 | 4 | 3713061.59 | 4-2022 |
fig, ax = plt.subplots()
p=sns.lineplot(x="Trimestre-Año", y="IMPVENTA", data=Quarter_Time_Series_1 , color='red')
p.set_xticks(np.arange(0 , len(Quarter_Time_Series_1) , 1))
plt.setp(p.get_xticklabels(), rotation=90)
plt.title("Quarter Time Series", fontsize = 17)
fig.savefig('p4.jpg', format='jpg', dpi=1200)
plt.show()

7.5 Yearly Time Series
We are going to use a new time series to do a yearly time series visualization.
Time_Series_2 = pd.read_csv('Time_Series_2.csv')
Time_Series_2['Fecha'] = pd.to_datetime(Time_Series_2['Fecha'])
Time_Series_2['Dia'] = Time_Series_2['Fecha'].dt.day
Time_Series_2['Semana'] = Time_Series_2['Fecha'].dt.isocalendar().week  # .dt.week was removed in recent pandas versions
Time_Series_2['Mes'] = Time_Series_2['Fecha'].dt.month
Time_Series_2['Trimestre'] = Time_Series_2['Fecha'].dt.quarter
Time_Series_2['Año'] = Time_Series_2['Fecha'].dt.year
Time_Series_2.loc[:,['Importe', 'Fecha', 'Dia', 'Semana', 'Mes', 'Trimestre', 'Año']]

| | Importe | Fecha | Dia | Semana | Mes | Trimestre | Año |
|---|---|---|---|---|---|---|---|
| 0 | 120.0 | 2022-11-12 | 12 | 45 | 11 | 4 | 2022 |
| 1 | 25.0 | 2022-11-08 | 8 | 45 | 11 | 4 | 2022 |
| 2 | 25.0 | 2022-11-08 | 8 | 45 | 11 | 4 | 2022 |
| 3 | 57.0 | 2022-11-07 | 7 | 45 | 11 | 4 | 2022 |
| 4 | 245.0 | 2022-11-05 | 5 | 44 | 11 | 4 | 2022 |
| … | … | … | … | … | … | … | … |
| 1017 | 270.0 | 2018-09-17 | 17 | 38 | 9 | 3 | 2018 |
| 1018 | 375.0 | 2018-07-03 | 3 | 27 | 7 | 3 | 2018 |
| 1019 | 45.0 | 2018-06-21 | 21 | 25 | 6 | 2 | 2018 |
| 1020 | 365.0 | 2018-06-20 | 20 | 25 | 6 | 2 | 2018 |
| 1021 | 18.0 | 2018-06-12 | 12 | 24 | 6 | 2 | 2018 |
1022 rows × 7 columns
We group the response variable Importe (the response variable of this second dataset) by year:
Yearly_Time_Series_2 = Time_Series_2.groupby(['Año'])['Importe'].sum().reset_index(drop=False)
Yearly_Time_Series_2

| | Año | Importe |
|---|---|---|
| 0 | 2018 | 20584.60 |
| 1 | 2019 | 52704.27 |
| 2 | 2020 | 81690.09 |
| 3 | 2021 | 97683.53 |
| 4 | 2022 | 95286.31 |
We make the plot:
Yearly_Time_Series_2['Año'] = Yearly_Time_Series_2['Año'].astype('string')
fig, ax = plt.subplots()
p=sns.lineplot(x="Año", y="Importe", data=Yearly_Time_Series_2 , color='red')
p.set_xticks(np.arange(0 , len(Yearly_Time_Series_2) , 1))
plt.setp(p.get_xticklabels(), rotation=90)
plt.title("Yearly Time Series", fontsize = 17)
fig.savefig('p5.jpg', format='jpg', dpi=1200)
plt.show()

8 Time Series Decomposition
Time series decomposition is a process by which we separate a time series into its components: trend, seasonality and residuals.
Trend represents the slow-moving changes in a time series. \(\\[0.25cm]\)
Seasonality represents the periodic pattern in the series: cycles that repeat over a fixed period of time. \(\\[0.25cm]\)
Residuals represent the behavior that cannot be explained by the trend and seasonality components. \(\\[0.25cm]\)
8.1 Time Series Decomposition in Python
from statsmodels.tsa.seasonal import seasonal_decompose, STL
Weekly_Time_Series_1

| | Año | Mes | Semana | IMPVENTA | Semana-Mes-Año |
|---|---|---|---|---|---|
| 0 | 2021 | 6 | 22 | 329029.82 | 22-6-2021 |
| 1 | 2021 | 6 | 23 | 158833.59 | 23-6-2021 |
| 2 | 2021 | 6 | 24 | 201568.21 | 24-6-2021 |
| 3 | 2021 | 6 | 25 | 196610.74 | 25-6-2021 |
| 4 | 2021 | 6 | 26 | 106849.14 | 26-6-2021 |
| … | … | … | … | … | … |
| 94 | 2022 | 12 | 48 | 89144.45 | 48-12-2022 |
| 95 | 2022 | 12 | 49 | 301484.19 | 49-12-2022 |
| 96 | 2022 | 12 | 50 | 308829.59 | 50-12-2022 |
| 97 | 2022 | 12 | 51 | 262326.49 | 51-12-2022 |
| 98 | 2022 | 12 | 52 | 311029.97 | 52-12-2022 |
99 rows × 5 columns
advanced_decomposition = STL(Weekly_Time_Series_1.IMPVENTA, period=4).fit()
fig, axs = plt.subplots(nrows=4, ncols=1, sharex=True)
plt.title("Time Series Decomposition", fontsize = 16)
p1=sns.lineplot(advanced_decomposition.observed, color='red', ax=axs[0])
p2=sns.lineplot(advanced_decomposition.trend, color='red', ax=axs[1])
p3=sns.lineplot(advanced_decomposition.seasonal, color='red', ax=axs[2])
p4=sns.lineplot(advanced_decomposition.resid, color='red', ax=axs[3])
p1.set_ylabel('Observed')
p2.set_ylabel('trend')
p3.set_ylabel('seasonal')
p4.set_ylabel('resid')
p1.set_xticks(np.arange(0 , len(Weekly_Time_Series_1) , 10))
plt.setp(p1.get_xticklabels(), rotation=90)
fig.savefig('p6.jpg', format='jpg', dpi=1200)
plt.show()

The following graph superimposes the observed series and the trend:
fig, ax = plt.subplots()
p1=sns.lineplot(x="Semana-Mes-Año", y="IMPVENTA", data=Weekly_Time_Series_1 , color='red')
p2=sns.lineplot(advanced_decomposition.trend, color='blue', linestyle='-', label='Trend')
p1.set_xticks(np.arange(0 , len(Weekly_Time_Series_1) , 10))
plt.setp(p1.get_xticklabels(), rotation=90)
plt.title("Trend and Time Series", fontsize = 17)
fig.savefig('p7.jpg', format='jpg', dpi=1200)
plt.show()

9 Time Series Forecasting
Given a stochastic process \(\hspace{0.15cm} \mathcal{Y} \hspace{0.1cm}=\hspace{0.1cm} \Bigl( \hspace{0.06cm} \mathcal{Y}_t \hspace{0.12cm}: \hspace{0.12cm} t \in T=\lbrace 1,2,...,n \rbrace \hspace{0.06cm}\Bigl) \hspace{0.1cm} = \hspace{0.1cm}\Bigl( \hspace{0.06cm} \mathcal{Y}_1 , \mathcal{Y}_2 ,..., \mathcal{Y}_n \hspace{0.06cm}\Bigl) \hspace{0.1cm}\) .
We have a time series \(\hspace{0.07cm}Y_t = \left( y_1, y_2, ...,y_n \right)^t \hspace{0.1cm}\) associated to the process \(\hspace{0.07cm}\mathcal{Y}\hspace{0.07cm}\).\(\\[0.4cm]\)
\(\hspace{0.05cm}y_t\hspace{0.07cm}\) is the value of the process \(\hspace{0.07cm}\mathcal{Y}\hspace{0.07cm}\) in the period \(\hspace{0.07cm}t\).\(\\[0.4cm]\)
Our goal is to predict the value of the process at future periods. To do so, we will use the values of the process observed in the time series, namely the available data. \(\\[0.4cm]\)
We want to predict \(\hspace{0.1cm}y_{n+k}\hspace{0.15cm}\), for \(\hspace{0.07cm}k=1,2,3,\dots\) \(\\[0.4cm]\)
The idea underlying most forecasting methods is to predict \(\hspace{0.1cm}y_{n+k}\hspace{0.1cm}\) as:
\[\widehat{y}_{n+k} \hspace{0.1cm}=\hspace{0.1cm} \widehat{g}(y_1, y_2, ...,y_n)\]
Observations:
We are assuming that the available data is \(\hspace{0.1cm}Y_t = \left( y_1, y_2, ...,y_n \right)^t \hspace{0.1cm}\). \(\\[0.35cm]\)
If \(\hspace{0.1cm}y_n\hspace{0.1cm}\) is the value of the process \(\hspace{0.09cm}\mathcal{Y}\hspace{0.09cm}\) corresponding to the most recent period, \(\hspace{0.1cm}y_{n+k}\hspace{0.1cm}\) is the value of the process in a future period. So, by definition, we don’t know \(\hspace{0.1cm}y_{n+k}\hspace{0.1cm}\), for \(\hspace{0.07cm}k=1,2,3,\dots\).
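A minimal sketch of the simplest such \(\widehat{g}\) (an assumed example, not from the original): the naive forecast, which predicts every future value with the last observed value \(y_n\):

import numpy as np

def naive_forecast(y, k):
    # y-hat_{n+j} = y_n for j = 1, ..., k
    return np.repeat(np.asarray(y, dtype=float)[-1], k)

naive_forecast([3.1, 2.8, 3.4], k=2)   # array([3.4, 3.4])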
10 Regression Problem vs Time Series Forecasting
You probably have encountered regression problems where you must predict some continuous target given a certain set of features. At first glance, time series forecasting seems like a typical regression problem: we have some historical data, and we wish to build a mathematical expression that will express future values as a function of past values. However, there are some key differences between time series forecasting and regression for time-independent scenarios that deserve to be addressed before we look at our very first forecasting technique.
Time series have an order
The first concept to keep in mind is that time series have an order, and we cannot change that order when modeling. In time series forecasting, we express future values as a function of past values. Therefore, we must keep the data in order, so as to not violate this relationship.
Other regression tasks in machine learning often do not have an order. For example, if you are tasked to predict revenue based on ad spend, it does not matter when a certain amount was spent on ads. Instead, you simply want to relate the amount of ad spend to the revenue. In fact, you might even randomly shuffle the data to make your model more robust. Here the regression task is to simply derive a function such that given an amount on ad spend, an estimate of revenue is returned. On the other hand, time series are indexed by time, and that order must be kept.
Time series sometimes do not have features
It is possible to forecast time series without the use of features other than the time series itself. As data scientists, we are used to having datasets with many columns, each representing a potential predictor for our target. For example, consider the task of predicting revenue based on ad spend, where the revenue is the target variable. As features, we could have the amount spent on Google ads, Facebook ads, and television ads. Using these three features, we would build a regression model to estimate revenue. However, with time series, it is quite common to be given a simple dataset with a time column and a value at that point in time. Without any other features, we must learn ways of using past values of the time series to forecast future values. This is when the moving average model or the autoregressive model comes into play, as they are ways to express future values as a function of past values.
11 Baseline Models
12 Stationary processes
Given a stochastic process \(\hspace{0.1cm}\left\lbrace \hspace{0.1cm} \mathcal{X}_t \hspace{0.1cm}/\hspace{0.1cm} t \in T=\lbrace 1,2,...,k \rbrace \hspace{0.1cm}\right\rbrace \hspace{0.1cm}\)
A stochastic process is strictly stationary if:
the probability distribution of \(\mathcal{X}_{t}\) is the same as that of \(\mathcal{X}_{t+h}\)
for all \(\hspace{0.1cm}t \in \lbrace 1,2,...,k \rbrace\hspace{0.1cm}\) and \(\hspace{0.1cm} h \in \lbrace 1,2,... \rbrace\).
Therefore, for any set of times \(\hspace{0.1cm}t_1 , t_2,...,t_n\)
\((\mathcal{X}_{t_1}, \mathcal{X}_{t_2},\dots ,\mathcal{X}_{t_n} )\hspace{0.1cm}\) is identically distributed as \(\hspace{0.1cm}(\mathcal{X}_{t_1+h}, \mathcal{X}_{t_2+h},\dots ,\mathcal{X}_{t_n+h} )\)
Strict stationarity is a very strong condition, since to prove it we must have the joint distributions for any set of variables in the process. A weaker property, but one which is easier to prove, is weak stationarity.
A stochastic process is weakly stationary if:
- \(\mu_t = \mu\) constant, \(\forall t \in \lbrace 1,2,...,k \rbrace\)
- \(\sigma_t^2 = \sigma^2\) constant, \(\forall t \in \lbrace 1,2,...,k \rbrace\)
- \(\gamma_{t , t + h} = Cov(\mathcal{X}_t,\mathcal{X}_{t+h}) = E[(\mathcal{X}_t - \mu)\cdot (\mathcal{X}_{t+h} - \mu)] = \gamma(h) \hspace{0.1cm},\hspace{0.1cm} \forall h \in \lbrace 0 , \pm 1 , \pm 2 ,... \rbrace\)
The first two conditions indicate that the mean and variance are constant.
The third indicates that the covariance between two variables depends only on their separation.
In a stationary process the autocovariances and autocorrelations depend only on the lag between the variables and, in particular, the relationship between \(\mathcal{X}_t\) and \(\mathcal{X}_{t+h}\) , is always equal to the relationship between \(\mathcal{X}_t\) and \(\mathcal{X}_{t-h}\) .
As a result, in stationary processes:
\(\gamma_{t , t + h} = \gamma_{t + r , t + h + r} = \gamma(h) , \forall r \in \lbrace 0 , \pm 1 , \pm 2 ,... \rbrace\)
\[\rho_{t, t+h} = \dfrac{\gamma_{t , t + h}}{\sqrt{\sigma_t^2 \cdot \sigma_{t+h}^2}} = \dfrac{\gamma(h)}{\sqrt{\sigma^2 \cdot \sigma^2}} = \dfrac{\gamma(h)}{\sigma^2} = \dfrac{\gamma(h)}{\gamma(0)} = \rho(h)\]
Where:
\(\gamma(0) = \sigma^2\)
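A small simulation (a sketch, not part of the original text) makes the contrast concrete: for Gaussian white noise the covariance between \(\mathcal{X}_t\) and \(\mathcal{X}_{t+h}\) does not depend on \(t\), while for a random walk it does:

import numpy as np

rng = np.random.default_rng(3)
noise = rng.normal(size=(10000, 100))   # 10000 realizations of white noise
walk = np.cumsum(noise, axis=1)         # 10000 realizations of a random walk

def ens_cov(x, t, h):
    # covariance between X_t and X_{t+h}, estimated across realizations
    return np.cov(x[:, t], x[:, t + h])[0, 1]

print(ens_cov(noise, 10, 5), ens_cov(noise, 50, 5))  # both approximately 0
print(ens_cov(walk, 10, 5), ens_cov(walk, 50, 5))    # approx. 11 vs 51: depends on t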
13 White Noise Process
A white noise process is a sequence of uncorrelated random variables with zero mean and constant variance \(\sigma^2\). The error terms \(a_t\) and \(\varepsilon_t\) appearing in the following sections are assumed to be white noise.
14 Transformations to achieve stationarity
15 Testing for stationarity
Dickey-Fuller test
Augmented Dickey-Fuller test
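A minimal sketch of the Augmented Dickey-Fuller test with statsmodels (the simulated series is an assumed example; in practice it would be the series under study):

from statsmodels.tsa.stattools import adfuller
import numpy as np

rng = np.random.default_rng(4)
y = rng.normal(size=500)            # white noise: a stationary series
adf_stat, p_value = adfuller(y)[:2]
# a small p-value rejects the unit-root null hypothesis, i.e. evidence of stationarity
print(adf_stat, p_value)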
16 Random walk
\(\hspace{0.25cm}\) A random walk is a stochastic process \(\hspace{0.1cm}\Bigl\{ \hspace{0.1cm} \mathcal{X}_t \hspace{0.1cm}:\hspace{0.1cm} t \in T=\lbrace 1,2,...,n \rbrace \hspace{0.1cm}\Bigl\} \hspace{0.1cm}\) defined as:
- \(\mathcal{X}_t = \mathcal{X}_{t-1} + a_t\) , for \(t \in \lbrace 1,...,n \rbrace\)
\(\hspace{0.25cm}\) where:
\(\mathcal{X}_0 = 0\) \(\\[0.25cm]\)
\(a_t \sim N(0, \sigma^2)\hspace{0.1cm}\), independent across \(t\) (white noise) \(\\[0.25cm]\)
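A random walk is straightforward to simulate (a sketch, taking \(\sigma = 1\)):

import numpy as np

rng = np.random.default_rng(5)
a = rng.normal(0, 1, 500)    # a_t ~ N(0, 1)
x = np.cumsum(a)             # X_t = X_{t-1} + a_t, with X_0 = 0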
16.1 Applying regular differencing and the Dickey-Fuller test to the random walk
17 Moving Average Process: MA
For an MA(1) model, the likelihood function takes the form:
\[\mathcal{L}(\theta_1, \sigma^2) = \prod_{t=1}^T \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y_t - \theta_1 \epsilon_{t-1})^2}{2\sigma^2}\right)\]
where \(\epsilon_t\) is a Gaussian error term with zero mean and variance \(\sigma^2\), \(\theta_1\) is a model parameter, and \(T\) is the length of the series. Again, the log-likelihood function is:
\[\ell(\theta_1, \sigma^2) = -\frac{T}{2} \ln(2\pi) - \frac{T}{2} \ln(\sigma^2) - \frac{1}{2\sigma^2} \sum_{t=1}^T (y_t - \theta_1 \epsilon_{t-1})^2\]
The goal is to find the values of the parameters \(\theta_1\) and \(\sigma^2\) that maximize the log-likelihood function. As in the other cases, the method of maximum likelihood estimates the parameters by finding the values of \(\theta_1\) and \(\sigma^2\) that maximize \(\ell(\theta_1, \sigma^2)\). This can also be achieved with numerical methods such as Newton-Raphson or gradient descent.
For an MA(q) model, the likelihood function takes the form:
\[\mathcal{L}(\theta_1, \ldots, \theta_q, \sigma^2) = \prod_{t=q+1}^T \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y_t - \sum_{i=1}^q \theta_i \epsilon_{t-i})^2}{2\sigma^2}\right)\]
where \(\epsilon_t\) is a Gaussian error term with zero mean and variance \(\sigma^2\), and \(\theta_1, \ldots, \theta_q\) are model parameters. Again, the log-likelihood function is:
\[\ell(\theta_1, \ldots, \theta_q, \sigma^2) = -\frac{T-q}{2} \ln(2\pi) - \frac{T-q}{2} \ln(\sigma^2) - \frac{1}{2\sigma^2} \sum_{t=q+1}^T (y_t - \sum_{i=1}^q \theta_i \epsilon_{t-i})^2\]
The goal is to find the values of the parameters \(\theta_1, \ldots, \theta_q\) and \(\sigma^2\) that maximize the log-likelihood function \(\ell(\theta_1, \ldots, \theta_q, \sigma^2)\). As before, this can be achieved with numerical methods such as Newton-Raphson or gradient descent.
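In practice this maximization is carried out numerically by standard libraries; as a sketch (simulated data, with \(\theta_1 = 0.6\) assumed), statsmodels fits an MA(1) by maximum likelihood through its ARIMA class with order (0, 0, 1):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
eps = rng.normal(0, 1, 501)
y = eps[1:] + 0.6 * eps[:-1]                  # simulated MA(1) with theta_1 = 0.6
res = sm.tsa.ARIMA(y, order=(0, 0, 1)).fit()  # maximum likelihood estimation
print(res.params)                             # constant, theta_1 estimate, sigma^2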
18 Autoregressive Model: AR(p)
Given a stochastic process \(\hspace{0.15cm} \mathcal{Y} \hspace{0.1cm}=\hspace{0.1cm} \Bigl( \hspace{0.06cm} \mathcal{Y}_t \hspace{0.12cm}: \hspace{0.12cm} t \in T=\lbrace 1,2,...,n \rbrace \hspace{0.06cm}\Bigl) \hspace{0.1cm} = \hspace{0.1cm}\Bigl( \hspace{0.06cm} \mathcal{Y}_1 , \mathcal{Y}_2 ,..., \mathcal{Y}_n \hspace{0.06cm}\Bigl) \hspace{0.1cm}\) .
We have a time series \(\hspace{0.07cm}Y_t = \left( y_1, y_2, ...,y_n \right)^t \hspace{0.1cm}\) associated to the process \(\hspace{0.07cm}\mathcal{Y}\hspace{0.07cm}\).\(\\[0.4cm]\)
Where \(\hspace{0.05cm}y_t\hspace{0.07cm}\) is the value of the process \(\hspace{0.07cm}\mathcal{Y}\hspace{0.07cm}\) in the period \(\hspace{0.07cm}t\).\(\\[0.4cm]\)
Our goal is to predict the value of the process at future periods. To do so, we will use the values of the process observed in the time series, namely the available data. \(\\[0.4cm]\)
AR (autoregressive) models are a class of time series models in which a variable is modeled as a linear function of its own past values. The AR(p) model specifies that the value of the variable at time \(t\) depends linearly on the \(p\) previous values of the variable, that is,
\[y_t \hspace{0.1cm}=\hspace{0.1cm} c \hspace{0.05cm}+\hspace{0.05cm} \sum_{i=1}^p \phi_i \cdot y_{t-i} \hspace{0.05cm}+\hspace{0.05cm} \varepsilon_t \hspace{0.1cm}=\hspace{0.1cm}c \hspace{0.05cm}+\hspace{0.05cm} \phi_1 \cdot y_{t-1} \hspace{0.05cm}+\hspace{0.05cm} \phi_2 \cdot y_{t-2}\hspace{0.05cm}+\hspace{0.05cm} \cdots \hspace{0.05cm}+\hspace{0.05cm} \phi_p \cdot y_{t-p} \hspace{0.05cm}+\hspace{0.05cm} \varepsilon_t\]
where:
It is assumed that \(\hspace{0.05cm}\varepsilon_t\hspace{0.05cm}\) is a r.v. with \(\hspace{0.05cm}E[\varepsilon_t]=0\hspace{0.05cm}\) and \(\hspace{0.05cm}Var(\varepsilon_t)=\sigma^2\hspace{0.05cm}\) \(\\[0.3cm]\)
\(y_t\hspace{0.05cm}\) is the value of the variable at time \(\hspace{0.05cm}t\). \(\\[0.3cm]\)
\(\phi_1, \ldots, \phi_p\hspace{0.05cm}\) are parameters to be estimated: the coefficients associated with each of the \(\hspace{0.05cm}p\hspace{0.05cm}\) lags of the process.\(\\[0.3cm]\)
\(c\hspace{0.05cm}\) is a constant.
Estimates of past values:
Assuming that the information available to estimate the model (parameter estimation) is \(\hspace{0.08cm}Y_t = (y_1,\dots, y_n)^t\hspace{0.08cm}\), we have the following equations for estimating the past values of the process: \(\\[0.25cm]\)
\[\widehat{y}_{t} \hspace{0.1cm}=\hspace{0.1cm} c \hspace{0.05cm}+\hspace{0.05cm} \sum_{h=1}^p \hspace{0.05cm}\widehat{\phi}_h \cdot y_{\hspace{0.05cm} t - h} \hspace{0.1cm}=\hspace{0.1cm} c \hspace{0.07cm}+\hspace{0.07cm} \widehat{\phi}_1 \cdot y_{\hspace{0.05cm}t-1} \hspace{0.07cm}+\hspace{0.07cm} \widehat{\phi}_2\cdot y_{\hspace{0.05cm}t-2} \hspace{0.07cm}+\hspace{0.07cm} \cdots \hspace{0.07cm}+\hspace{0.07cm} \widehat{\phi}_p \cdot y_{\hspace{0.05cm}t-p} \hspace{0.25cm} , \hspace{0.25cm} t=1,\dots , n \\[0.55cm]\]
Predictions of future values:
Assuming again that the available information is \(\hspace{0.08cm}Y_t = (y_1,\dots, y_n)^t\hspace{0.08cm}\), we have the following equations for predicting the future values of the process: \(\\[0.25cm]\)
\[\widehat{y}_{n+1} \hspace{0.1cm}=\hspace{0.1cm} c \hspace{0.05cm}+\hspace{0.05cm} \sum_{h=1}^p \hspace{0.05cm}\widehat{\phi}_h \cdot y_{(n+1) - h} \hspace{0.1cm}=\hspace{0.1cm} c \hspace{0.07cm}+\hspace{0.07cm} \widehat{\phi}_1 \cdot y_{n} \hspace{0.07cm}+\hspace{0.07cm} \widehat{\phi}_2\cdot y_{n-1} \hspace{0.07cm}+\hspace{0.07cm} \cdots \hspace{0.07cm}+\hspace{0.07cm} \widehat{\phi}_p \cdot y_{n-(p-1)} \\[0.25cm]\]
\[\widehat{y}_{n+2} \hspace{0.1cm}=\hspace{0.1cm} c \hspace{0.07cm}+\hspace{0.07cm} \widehat{\phi}_1 \cdot \widehat{y}_{n+1} \hspace{0.07cm}+\hspace{0.07cm} \sum_{h=2}^p \hspace{0.05cm} \widehat{\phi}_h \cdot y_{(n+2) - h} \hspace{0.1cm}=\hspace{0.1cm} c \hspace{0.05cm}+\hspace{0.05cm} \widehat{\phi}_1 \cdot \widehat{y}_{n+1} \hspace{0.05cm}+\hspace{0.05cm} \widehat{\phi}_2 \cdot y_{n} \hspace{0.05cm}+\hspace{0.05cm} \widehat{\phi}_3\cdot y_{n-1} \hspace{0.05cm}+\hspace{0.05cm} \cdots \hspace{0.05cm}+\hspace{0.05cm} \widehat{\phi}_p \cdot y_{n-(p-2)} \\[0.3cm]\]
\[\widehat{y}_{n+3} \hspace{0.1cm}=\hspace{0.1cm} c \hspace{0.07cm}+\hspace{0.07cm} \widehat{\phi}_1 \cdot \widehat{y}_{n+2} \hspace{0.07cm}+\hspace{0.07cm} \widehat{\phi}_2 \cdot \widehat{y}_{n+1} \hspace{0.07cm}+\hspace{0.07cm} \sum_{h=3}^p \hspace{0.05cm} \widehat{\phi}_h \cdot y_{(n+3) - h} \hspace{0.1cm}=\hspace{0.1cm} c \hspace{0.05cm}+\hspace{0.05cm} \widehat{\phi}_1 \cdot \widehat{y}_{n+2} \hspace{0.05cm}+\hspace{0.05cm} \widehat{\phi}_2 \cdot \widehat{y}_{n+1} \hspace{0.05cm}+\hspace{0.05cm} \widehat{\phi}_3 \cdot {y}_{n} \hspace{0.05cm}+\hspace{0.05cm} \widehat{\phi}_4\cdot y_{n-1} \hspace{0.05cm}+\hspace{0.05cm} \cdots \hspace{0.05cm}+\hspace{0.05cm} \widehat{\phi}_p \cdot y_{n-(p-3)} \\[0.5cm]\]
In general, for \(k=1,2,3,\dots \\\)
\[\widehat{y}_{n+k} \hspace{0.1cm}=\hspace{0.1cm} c \hspace{0.07cm}+\hspace{0.07cm} \sum_{h=1}^{\min(k-1,\hspace{0.05cm} p)} \hspace{0.05cm} \widehat{\phi}_h \cdot \widehat{y}_{(n+k)-h} \hspace{0.07cm}+\hspace{0.07cm} \sum_{h=k}^p \hspace{0.05cm} \widehat{\phi}_h \cdot y_{(n+k) - h} \\[0.3cm]\]
where the first sum uses predicted values (lags falling after period \(n\)) and the second sum uses observed values (lags falling at or before period \(n\)); for \(k > p\) all the lagged values are predictions and the second sum is empty.
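These recursions are easy to implement directly; the following is a minimal sketch (the function ar_forecast and its arguments are illustrative, not from the original) that produces \(\widehat{y}_{n+1}, \dots, \widehat{y}_{n+K}\) given estimated coefficients:

import numpy as np

def ar_forecast(y, c, phi, K):
    # y: observed series (y_1, ..., y_n); c: constant; phi: (phi_1, ..., phi_p); K: horizon
    history = list(np.asarray(y, dtype=float))
    preds = []
    p = len(phi)
    for _ in range(K):
        lags = history[-1 : -p - 1 : -1]   # y_{t-1}, ..., y_{t-p}, observed or predicted
        preds.append(c + np.dot(phi, lags))
        history.append(preds[-1])          # predictions feed the later lags
    return np.array(preds)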
Maximum likelihood estimation of the model:
For an \(\hspace{0.05cm}AR(p)\hspace{0.05cm}\) model with Gaussian error term \(\hspace{0.07cm}\varepsilon_t\hspace{0.07cm}\), that is, \(\hspace{0.07cm}\varepsilon_t \sim N(0,\sigma^2)\hspace{0.07cm}\), the following relation is assumed for each variable of the process:
\[\mathcal{Y}_t \hspace{0.1cm}=\hspace{0.1cm} c + \sum_{h=1}^p \phi_h \cdot y_{t-h} + \varepsilon_t\]
Since the error term is Gaussian, as specified above, it follows that, conditional on the \(p\) previous values:
\[\mathcal{Y}_t \sim N\Bigl(c + \sum_{h=1}^p \phi_h \cdot y_{t-h} \hspace{0.1cm},\hspace{0.1cm} \sigma^2\Bigr)\]
for each \(t=p+1,\dots , n\)
Therefore, the (conditional) likelihood function of the model is:
\[\mathcal{L}(c,\phi_1,\dots, \phi_p, \sigma^2) \hspace{0.07cm}=\hspace{0.07cm} f(y_{p+1},y_{p+2},...,y_n \mid y_1,\dots,y_p) \hspace{0.07cm}=\hspace{0.07cm} \prod_{t=p+1}^n f_{\mathcal{Y}_t}(y_t) \hspace{0.07cm}=\hspace{0.07cm} \prod_{t=p+1}^n \frac{1}{\sqrt{2\pi\sigma^2}} \cdot \exp\left(-\frac{\bigl(y_t - c - \sum_{i=1}^p \phi_i \hspace{0.05cm} y_{t-i}\bigr)^2}{2\sigma^2}\right)\]
where \(\boldsymbol{\phi} = (\phi_1, \ldots, \phi_p)\) is the vector of parameters and \(y_t\) is the time series, for \(t=1, \ldots, n\). Again, the log-likelihood function is:
\[ \ell(c, \boldsymbol{\phi}, \sigma^2) = -\frac{n-p}{2} \ln(2\pi) - \frac{n-p}{2} \ln(\sigma^2) - \frac{1}{2\sigma^2} \sum_{t=p+1}^n \Bigl(y_t - c - \sum_{i=1}^p \phi_i y_{t-i}\Bigr)^2 \]
The goal is to find the values of the parameters \(c\), \(\boldsymbol{\phi}\) and \(\sigma^2\) that maximize the log-likelihood function \(\ell(c, \boldsymbol{\phi}, \sigma^2)\). As in the AR(1) case, this can be achieved with numerical methods such as Newton-Raphson or gradient descent.
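As a sketch of this estimation in practice (simulated data with assumed parameters), statsmodels maximizes the Gaussian likelihood of an AR(p) via ARIMA(p, 0, 0); note that under its parameterization the reported constant is the process mean rather than \(c\):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
y = np.zeros(500)
for t in range(2, 500):                        # simulate an AR(2) with c = 0.5
    y[t] = 0.5 + 0.6 * y[t - 1] - 0.3 * y[t - 2] + rng.normal()
res = sm.tsa.ARIMA(y, order=(2, 0, 0)).fit()   # Gaussian maximum likelihood
print(res.params)                              # constant (mean), phi_1, phi_2, sigma^2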
Suppose now that the series has been transformed to achieve stationarity, working with \(w_t = (1-B)\log(y_t)\) (the first difference of the log of the series), and that an AR(2) model has been fitted to \(w_t\). To obtain predictions for \(y_t\) from the predictions for \(w_t\), we must apply the inverse of the transformation used to obtain \(w_t\) from \(y_t\). Since \(w_t = (1-B) \log(y_t) = \log(y_t) - \log(y_{t-1})\), where \(B\) is the backward shift (lag) operator, inverting the difference gives:
\[ \log(y_t) = \log(y_{t-1}) + w_t \quad \Longleftrightarrow \quad y_t = y_{t-1} \, e^{w_t} \]
Formally, inverting \((1-B)\) amounts to applying \((1-B)^{-1}\), which can be expressed as an infinite geometric series in the lag operator:
\[ (1-B)^{-1} = 1 + B + B^2 + B^3 + \dots \]
so that \(\log(y_t) = (1-B)^{-1} w_t = w_t + w_{t-1} + w_{t-2} + \dots\); applied to forecasts, this is simply a cumulative sum. Iterating the relation above from the last observed value \(y_n\), the predictions for \(y_t\) are:
\[ \widehat{y}_{n+k} = y_n \cdot \exp\Bigl( \sum_{j=1}^{k} \widehat{w}_{n+j} \Bigr) \]
where \(\widehat{w}_{n+j}\) are the predictions obtained by the AR(2) model for \(w_t\). This means that we can use the AR(2) predictions for \(w_t\) and the equation above to obtain the predictions for \(y_t\).
In Python, one way to apply this inversion is to use the statsmodels library, whose ARIMA class allows fitting ARIMA models and obtaining forecasts.
First, we need to fit an ARIMA(2,0,0) model to the training part of the series \(w_t\). To do so, we can proceed as follows:
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Create the series w_t and split it into training and test sets
w_t = np.log(y_t).diff().dropna()   # assuming the series y_t is already available
train_w = w_t[:n_train]
test_w = w_t[n_train:]

# Fit an ARIMA(2,0,0) model to the training set
model = sm.tsa.ARIMA(train_w, order=(2, 0, 0)).fit()

Then we can use the forecast method of the fitted model to obtain predictions for the series \(w_t\) over the test set (with the current statsmodels API, forecast returns the predicted values directly):

# Obtain predictions for the series w_t over the test period
preds_w = model.forecast(steps=len(test_w))

Finally, we apply the equation \(\widehat{y}_{n+k} = y_n \cdot \exp\bigl(\sum_{j=1}^{k} \widehat{w}_{n+j}\bigr)\) to obtain predictions for the series \(y_t\):

# Invert the differencing of log(y_t): cumulative sum of the forecasts of w_t,
# anchored at the last observed value of y_t in the training period
last_y = y_t.iloc[n_train]   # w_t is shifted one period with respect to y_t
preds_y = last_y * np.exp(np.cumsum(preds_w))
A note on the fitted model object: model.k_ar is an attribute of the ARIMA model fitted by statsmodels and represents the number of autoregressive (AR) terms in the model. Here an ARIMA(2,0,0) was fitted, so model.k_ar equals 2, and the attribute params stores the estimated constant, the AR coefficients \(\widehat{\phi}_1, \widehat{\phi}_2\) and the error variance.
One might be tempted to replace the operator series \((1 + B + B^2 + B^3 + \dots)\) by the sum of the estimated coefficients \(\widehat{\phi}_1 + \widehat{\phi}_2\) of the AR(2) model, but this is not valid: the identity \((1-B)^{-1} = 1 + B + B^2 + B^3 + \dots\) involves powers of the lag operator \(B\), not the model coefficients, and the operator cannot be collapsed into a number. The AR(2) structure of \(w_t\), which can be written in transfer-function form as
\[ (1 - \phi_1 B - \phi_2 B^2) \, w_t = c + \varepsilon_t \]
enters the prediction of \(y_t\) only through the forecasts \(\widehat{w}_{n+j}\). Once those forecasts are available, undoing the transformation \(w_t = (1-B)\log(y_t)\) is exactly the cumulative-sum formula
\[ \widehat{y}_{n+k} = y_n \cdot \exp\Bigl( \sum_{j=1}^{k} \widehat{w}_{n+j} \Bigr) \]
implemented in the code above. These predictions recover the original series \(y_t\), and they will be accurate provided the AR(2) model is a good representation of the data.